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(54) Interrupt processing during iterative instruction execution 



(57) A processing engine comprises an instruction 
decode register adapted to buffer a complete instruction 
pending decode thereof, and a decode mechanism con- 
figured to decode instructions. The processing engine 



is responsive to a repeat instruction to repeat execution 
of a subsequent instruction and further responsive to an 
interrupt signal to interrupt said repeat execution ot said 
subsequent instruction. 
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Description 

FIELD OF INVENTION 

[0001] The present invention relates to processing 
engines, and in particular but not exclusively, to 
processing engines configured to repeat program flow 

BACKGROUND OF INVENTION 

[0002] Many different types of processing engine are 
known, of which microprocessors are but one example. 
For example, Digital Signal Processors (DSPs) are 
widely used, in particular for specific applications. DSPs 
are typically configured to optimise the performance of 
the applications concerned and to achieve this they em- 
ploy more specialised execution units and instruction 
sets. They have not, however, been provided with the 
parallel instruction execution architectures found in mi- 
croprocessors and do not operate under the operating 
systems used to control such microprocessors. 
[0003] In a DSP or microprocessor for example, ma- 
chine-readable instructions stored in a program memory 
are sequentially executed by the processing engine in 
order for the processing engine to perform operations 
or functions. The sequence of machine-readable in- 
structions is termed a "program". Although the program 
instructions are typically performed sequentially, certain 
instructions permit the program sequence to be broken, 
and for the program flow to "Branch 0 or "Jump" to an- 
other part of the program, where sequential program 
flow re-starts. Often the later program sequence in- 
cludes an instruction which "Returns" the program flow 
to the original program sequence. 
[0004] The program portion terminated by the "Re- 
turn" is often termed a "sub-routine", and Branching or 
Jumping to a " sub-routine" is referred to as 'Calling 0 a 
sub-routine. In this way, often used functions or tasks 
do not need to be included throughout a program, but 
instead may be coded once as a sub-routine and called 
from the main program as necessary. 
[0005] Sub-routines are often called in response to an 
interrupt request received by the processing engine in 
order to service the interrupt. 
[0006] It is often also desirable to repeat instructions 
within a program. Such repetition is known as "looping" 
or "iteration*. In particular instances it is desirable to re- 
peat a single instruction using a repeal loop. 
[0007] In known processing engines instruction reg- 
isters are only sixteen bits long. Long instruction words 
taking multiple processor cycles to execute cannot com- 
pletely fit into such instruction registers. In order to de- 
code and execute them, the long instruction is split into 
sections of 1 6 bits or less and decoded and executed 
separately. Typically, the separate parts of the instruc- 
tion are passed to different registers within the process- 
ing engine. For example, an op-code may be passed to 
an instruction decode register from where the op-code 



is decoded,, whilst an operand for the op-code may be 
passed to another register such as a data register or 
accumulator. 

[0008] Such long instructions can be repeated as a 
5 single instruction repeat. However, the op-code need 
only be decoded once, ie during the first pass of the loop, 
whilst the operand is up-dated as appropriate for each 
pass of the loop. 

[0009] On receiving an interrupt request, the process- 
to ing engine needs to save a lot of information regarding 
the location, status and current value of the separate 
sections of the repeated instruction to a stack register 
or the like, in order to reliably return from servicing the 
interrupt. The volume of data that it is necessary to store 
is for conventional processing engines having short in- 
struction registers is such so as to make the processing 
and storage overhead uneconomic or unviable. Thus, 
servicing of interrupts is inhibited during a single instruc- 
tion repeat loop. 
20 [0010] However, it is necessary to service interrupts 
within a reasonable time period after they occur. Hence, 
in known processing engines a repeat loop is set-up in 
sections of 10 or 100 repeats for example. Gaps be- 
tween each repeat section provide an opportunity for the 
2S processing engine to service any interrupts that may 
have occurred during the repeat section 
[0011] The present invention is directed to improving 
the performance of processing engine such as, for ex- 
ample but not exclusively, digital signal processors. 

30 

SUMMARY OF INVENTION 

[0012] Particular and preferred aspects of the inven- 
tion are set out in the accompanying independent and 
35 dependent claims. Combinations of features from the 
dependent claims may be combined with features of the 
independent claims as appropriate and not merely as 
explicitly set out in the claims. 
[0013] In accordance with a first aspect of the inven- 
40 tion there is provided a processing engine comprising 
an instruction decode register adapted to buffer a com- 
plete instruction pending decode thereof. The process- 
ing engine also comprises a decode mechanism config- 
ured to decode instructions. The processing engine is 
45 responsive to a repeat instruction to repeat the execu- 
tion of a subsequent instruction, and is further respon- 
sive to a signal to interrupt said repeat execution of said 
subsequent instruction. 

[0014] In a second aspect in accordance with the in- 
so vention, there is provided a method for operating a 
processing engine including an instruction decode reg- 
ister. The method comprises (i) repeatedly executing an 
instruction wholly disposed in said instruction register, 
(ii) interrupting said second repeated execution, (iii) 
55 loading said instruction register with another instruction, 
and (iv) executing said another insTruction. 
[001 S] An advantage of a preferred embodiment in 
accordance with the first or second aspects of the in- 
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yention is that interruption of a repeat loop for a single 
instruction may occur without the need to store large 
amounts of context information. Additionally, complete 
or whole instructions fit into the instruction decode reg- 
ister thereby facilitating repeat execution of single in- 
structions comprising multi-cycle instructions or two in- 
structions executable in parallel, for example. Such a 
multi-cycle instruction may for example comprise a long 
offset modifier (16 bits), absolute address (e.g. 24 bits) 
or a single memory instruction to add a constant to the 
contents of the memory location and store the result in 
the same memory location. This would comprise a 
memory read cycle first, and on the next cycle execution 
of the arithmetic operation and writing the result to the 
single memory location. This would only take two cycles 
since there is no need to re-compute the address. 
[001 6] A further advantage of preferred embodiments 
in accordance with the first or second aspects of the in- 
vention is that there is no need to limit the size of the 
repeat loop, in order to provide for opportunities to serv- 
ice interrupt requests. Additionally, they reduce interrupt 
service latency. 

[0017] Preferably the processing engine is respon- 
sive to the signal to interrupt the decode mechanism, 
but is configured to complete a current decode operation 
prior to disablement. 

[001 8] Typically a processing engine comprises a pro- 
gram counter for pointing to a program memory location 
corresponding to a next instruction to be decoded. Such 
processing engines also provide a memory area for stor- 
ing a value located in said program counter during an 
interrupt. The same memory, or a different memory, is 
usually provided for storing context information associ- 
ated with the next or subsequent instruction. This pro- 
vides for storage of relevant information such as the lo- 
cation in program memory of the next instruction during 
the interrupt in orderto facilitate return to the appropriate 
part in the program sequence once the interrupt has 
been serviced. 

[0019] It is desirable that the value of the program 
counter and context information can be stored quickly, 
and to this end processing engines generally provide a 
register, often embedded on a processing core for the 
processing engine, to which such information may be 
stored. Preferably, the total number of bits of context da- 
ta and program counter data do nt exceed the data 
bandwidth of the stack register, thus enabling context/ 
program counter data to be pused to the stack in a single 
processor cycle. 

[0020] There is generally provided interrupt control 
mechanisms which are responsive to an interrupt re- 
quest signal to initiate signals for interrupting the repeat 
execution. Such interrupt control mechanisms are used 
for setting an interrupt flag within the processing engine 
to indicate that an interrupt request has been received. 
Additionally, the interrupt control mechanisms typically 
determine the nature of an interrupt request and arrange 
for calling a corresponding program sub-routine to serv- 



ice the interrupt. 

[0021] "Suitably, embodiments of the invention are 
within integrated circuits, typically within a digital signal 
processor or the like. This is particularly suitable appli- 
s cations for inclusion within electronic apparatus, and in 
the case of digital signal processors in telecommunica- 
tions apparatus. 

[0022] In accordance with the preferred embodiments 
of the invention, it is possible to relatively easily interrupt 

10 the single repeat processes compared to the prior art, 
where such interrupts would have to be separately cod- 
ed. Consequently, there is no need to halt repeats to 
determine if any interrupts have occurred during the re- 
peat loop which reduces processor cycles for perform- 

15 ing such loops. Thus, there is a corresponding reduction 
in power consumption by the processing engine, and in 
the code size. Therefore, embodiments of the invention 
are particularly suitable for use in portable apparatus, 
such as wireless communication devices. Typically, 

20 such a wireless communication device comprises a user 
interface including a display such as liquid crystal dis- 
play or a TFT display, and a keypad or keyboard for in- 
putting data to the communications device. Additionally, 
a wireless communication device will also comprise a 

25 transceiver and an antenna for wireless communication 
with a radio telephone network or the like. 
[0023] Further aspects and advantages of the inven- 
tion will become apparentfrom thefollowing description. 

30 BRIEF DESCRIPTION OF THE DRAWINGS 

[0024] Particular embodiments in accordance with 
the invention will now be described, by way of example 
only, and with reference to the accompanying drawings 
35 in which like reference signs are used to denote like 
parts unless otherwise stated, and in which: 

Figure 1 is a schematic block diagram of a proces- 
sor in accordance with an embodiment of the inven- 
40 tion; 

Figure 2 is a schematic diagram of a core of the 
processor of Figure 1; 

Figure 3 is a more detailed schematic biockdiagram 
of various execution units of the core of the proces- 
45 sor of Figure 1 ; 

Figure 4 is schematic diagram of an instruction buff- 
er queue and an instruction decoder controller of 
the processor of Figure 1 ; 

Figure 5 is a representation of pipeline phases of 
50 the processor of Figure 1 ; 

Figure 6 is a diagrammatic illustration of an example 
of operation of a pipeline in the processor of Figure 
1; 

Figure 7 is a schematic representation of the core 
ss of the processor for explaining the operation of the 
pipeline of the processor of Figure 1 ; Memory Inter- 
face Unit \ 

Figure 8 is a schematic illustration of a known 
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processing engine; 

Figure 9 is a schematic illustration of a preferred 
embodiment of the invention; 
Figure 10 is a schematic illustration of memory or- 
ganised for stiring interrupt sub-routine addresses 
and interrupt sub-routines; 
Figure 11 diagrammatically illustrates the (a) pipe- 
line stages of an interrupted single repeat instruc- 
tion in accordance with an embodiment of the in- 
vention, and (b) register contents corresponding to 
the pipeline stages of an interrupted single repeated 
instruction and synchronised to an interrupt signal, 
in accordance with an embodiment of the invention; 
Figure 12 diagramatically illustrates the (a) pipeline 
stages of a return from interrupt of a single repeat 
instruction in accordance with an embodiment of 
the invention and (b) register contents correspond- 
ing to the pipeline stagesof the return from interrupt, 
and 

Figure 13 is a schematic illustration of a wireless 
communication device suitable for including an em- 
bodiment of the invention. 

DESCRIPTION OF PARTICULAR EMBODIMENTS 

[0025] Although the invention finds particular applica- 
tion to Digital Signal Processors (DSPs), implemented 
for example in an Application Specific Integrated Circuit 
(ASIC) it also finds application to other forms of process- 
ing engines. 

[0026] The basic architecture of an example of a proc- 
essor according to the invention will now be described. 
[0027] Figure 1 is a schematic overview of a proces- 
sor 1 0 forming an exemplary embodiment of the present 
invention. The processor 10 includes a processing en- 
gine 100 and a processor backplane 20. In the present 
embodiment, the processor is a Digital Signal Processor 
10 implemented in an Application Specific Integrated 
Circuit (ASIC). 

[0028] As shown in Figure 1 , the processing engine 
100 forms a central processing unit (CPU) with a 
processing core 102 and a memory interface, or man- 
agement, unit 104 for interfacing the processing core 
102 with memory units external to the processor core 
102. 

[0029] The processor backplane 20 comprises a 
backplane bus 22, to which the memory management 
unit 104 of the processing engine is connected. Also 
connected to the backplane bus 22 is an instruction 
cache memory 24, peripheral devices 26 and an exter- 
nal interface 28. 

[0030] It will be appreciated that in other embodi- 
ments, the invention could be implemented using differ- 
ent configurations and/or different technologies. For ex- 
ample, the processing engine 100 could form the proc- 
essor 10, with the processor backplane 20 being sepa- 
rate therefrom, The processing engine 100 could, for ex- 
ample be a DSP separate from and mounted on a back- 



plane 20 supporting a backplane bus 22, peripheral and 
external interfaces. The processing engine 100 could, 
for example, be a microprocessor rather than a DSP and 
could be implemented in technologies other than ASIC 

5 technology. The processing engine, or a processor in- 
cluding the processing engine, could be implemented in 
one or more integrated circuits. 
[0031] Figure 2 illustrates the basic structure of an 
embodiment of the processing core 102. As illustrated, 

10 the processing core 1 02 includes four elements, namely 
an Instruction Buffer Unit (I Unit) 106 and three execu- 
tion units. The execution units are a Program Flow Unit 
(P Unit) 108, Address Data Flow Unit (A Unit) 110 and 
a Data Computation Unit (D Unit) 112 for executing in- 

75 structions decoded from the Instruction Buffer Unit (I 
Unit) 106 and for controlling and monitoring program 
flow. 

[0032] Figure 3 illustrates the P Unit 108, A Unit 110 
and D Unit 1 1 2 of the processing core 1 02 in more detail 

20 and shows the bus structure connecting the various el- 
ements of the processing core 102. The P Unit 108 in- 
cludes, for example, loop control circuitry, GoTc/Branch 
control circuitry and various registers for controlling and 
monitoring program flow such as repeat counter regis- 

25 ters and interrupt mask, flag or vector registers. The P 
Unit 108 is coupled to general purpose Data Write 
busses (EB, FB) 1 30, 1 32, Data Read busses (CB : DB) 
1 34, 1 36 and a coefficient program bus (BB) 1 38. Addi- 
tionally, the P Unit 108 is coupled to sub-units within the 

30 A Unit 110 and D Unit 112 via various busses labeled 
CSR, ACBand RGD. 

[0033] As illustrated in Figure 3, in the present em- 
bodiment the A Unit 110 includes a register file 30, a 
data address generation sub-unit (DAGEN) 32 and an 

os Arithmetic and Logic Unit (ALU) 34. The A Unit register 
file 30 includes various registers, among which are 16 
bit pointer registers (AR0, .... AR7) and data registers 
(DR0, .... DR3) which may also be used for data flow as 
well as address generation. Additionally, the register file 

40 includes 16 bit circular buffer registers and 7 bit data 
page registers. As well as the general purpose busses 
(EB, FB, CB, DB) 130, 132, 134, 136, a coefficient data 
bus 140 and a coefficient address bus 142 are coupled 
to the A Unit register file 30. The A Unit register file 30 

4& is coupled to the A Unit DAGEN unit 32 by unidirectional 
busses 144 and 146 respectively operating in opposite 
directions. The DAGEN unit 32 includes 16 bit X/Y reg- 
isters and coefficient and stack pointer registers, for ex- 
ample for controlling and monitoring address generation 

so within the processing engine 1 00. 

[0034] The A Unit 110 also comprises the ALU 34 
which includes a shifter function as well as the functions 
typically associated with an ALU such as addition, sub- 
traction, and AND, OR and XOR logical operators. The 

55 ALU 34 is also coupled to the general-purpose busses 
(EB, DB) 130, 1 36 and an instrucTEon constant data bus 
(KDB) 140. The A Unit ALU is coupled to the P Unit 108 
by a PDA bus for receiving register content from the P 
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Unit 108 register file. The ALU 34 is also coupled to the 
A Unit register file 30 by busses RGA and RGB for re- 
ceiving address and data register contents and by a bus 
RGD for forwarding address and data registers in the 
register file 30. 

[0035] As illustrated, the D Unit 112 includes a D Unit 
register file 36, a D Unit ALU 38, a D Unit shifter 40 and 
two multiply and accumulate units (MAC1, MAC2) 42 
and 44. The D Unit register file 36, D Unit ALU 38 and 
D Unit shifter 40 are coupled to busses (EB, FB, CB, DB 
and KDB) 130, 132, 134, 136 and 140, and the MAC 
units 42 and 44 are coupled to the busses (CB, DB, 
KDB) 1 34, 1 36, 1 40 and data read bus (BB) 144. The D 
Unit register file 36 includes 40-bit accumulators 
( AC0, . . . AC3) and a 1 6-bit transition register. The D Unit 
112 can also utilize the 16 bit pointer and data registers 
in the A Unit 110 as source or destination registers in 
addition to the 40-bit accumulators. The D Unit register 
file 36 receives data from the D Unit ALU 38 and MACs 
1&2 42, 44 over accumulator write busses (ACWO, 
ACW1) 146, 148, and from the D Unit shifter 40 over 
accumulator write bus (ACW1) 148. Data is read from 
the D Unit register file accumulators to the D Unit ALU 
38. D Unit shifter 40 and MACs 1&2 42, 44 over accu- 
mulator read busses (ACRO, ACR1) 150, 152. The D 
Unit ALU 38 and D Unit shifter 40 are also coupled to 
sub-units of the A Unit 108 via various busses labeled 
EFC, DRB, DR2 and ACB. 

[0036] Referring now to Figure 4, there is illustrated 
an instruction buffer unit 106 comprising a 32 word in- 
struction buffer queue (IBQ) 502. The IBQ 502 compris- 
es 32X16 bit registers 504, logically divided into 8 bit 
bytes 506. Instructions arrive at the IBQ 502 via the 
32-bit program bus (PB) 122. The instructions are 
fetched in a 32-bit cycle into the location pointed to by 
the Local Write Program Counter (LWPC) 532. The 
LWPC 532 is contained in a register located in the P Unit 
108. The P Unit 108 also includes the Local Read Pro- 
gram Counter (LRPC) 536 register, and the Write Pro- 
gram Counter (WPC) 530 and Read Program Counter 
(RPC) 534 registers. LRPC 536 points to the location in 
the IBQ 502 of the next instruction or instructions to be 
loaded into the instruction decoder(s) 51 2 and 51 4. That 
is to say, the LRPC 534 points to the location in the IBQ 
502 of the instruction currently being dispatched to the 
decoders 512, 514. The WPC points to the address in 
program memory of the start of the next 4 bytes of in- 
struction code for the pipeline. For each fetch into the 
IBQ, the next 4 bytes from the program memory are 
fetched regardless of instruction boundaries. The RPC 
534 points to the address in program memory of the in- 
struction currently being dispatched to the decoders) 
512 and 514. 

[0037] The instructions are formed into a 48-bit word 
and are loaded into the instruction decoders 512, 514 
over a 48-bit bus 516 via multiplexors 520 and 521. It 
will be apparent to a person of ordinary skill in the art 
that the instructions may be formed into words compris- 



ing other than 48-bits, and that the present invention is 
not limited to the specific embodiment described above. 
[0038] The bus 516 can load a maximum of two in- 
structions, one per decoder, during any one instruction 

5 cycle. The combination of instructions may be in any 
combination of formats, 8, 16, 24, 32, 40 and 48 bits, 
which will fit across the 48-bit bus. Decoder 1, 512, is 
loaded in preference to decoder 2, 514, if only one in- 
struction can be loaded during a cycle. The respective 

10 instructions are then forwarded on to the respective 
function units in order to execute them and to access 
the data for which the instruction or operation is to be 
performed. Prior to being passed to the instruction de- 
coders, the instructions are aligned on byte boundaries. 

*5 The alignment is done based on the format derived for 
the previous instruction during decoding thereof. The 
multiplexing associated with the alignment of instruc- 
tions with byte boundaries is performed in multiplexors 
520 and 521. 

20 [0039] The processor core 1 02 executes instructions 
through a 7 stage pipeline, the respective stages of 
which will now be described with reference to Figure 5. 
[0040] The first stage of the pipeline is a PRE-FETCH 
(P0) stage 202, during which stage a next program 

25 memory location is addressed by asserting an address 
on the address bus (PAB) 1 1 8 of a memory interface, or 
memory management unit 1 04. 
[0041] In the next stage, FETCH(P1) stage 204, the 
program memory is read and the I Unit 106 is filled via 

30 the PB bus 1 22 from the memory management unit 1 04. 
[0042] The PRE-FETCH and FETCH stages are sep- 
arate from the rest of the pipeline stages in that the pipe- 
line can be interrupted during the PRE-FETCH and 
FETCH stages to break the sequential program flow and 

35 point to other instructions in the program memory, for 
example for a Branch instruction. 
[0043] The next instruction in the instruction buffer is 
then dispatched to the decoder/s 512/514 in the third 
stage, DECODE (P2) 206, where the instruction is de- 

40 coded and dispatched to the execution unitfor executing 
that instruction, for example to the P Unit 108, the A Unit 
110 or the D Unit 112. The decode stage 206 includes 
decoding at least part of an instruction including a first 
part indicating the class of the instruction, a second part 

45 indicating the format of the instruction and a third part 
indicating an addressing mode for the instruction. 
[0044] The next stage is an ADDRESS (P3) stage 
208, in which the address of the data to be used in the 
instruction is computed, or a new program address is 

so computed should the instruction require a program 
branch or jump. Respective computations take place in 
the A Unit 110 or the P Unit 108 respectively. 
[004S] In an ACCESS (P4) stage 210 the address of 
a read operand is output and the memory operand, the 

55 address of which has been generated in a DAGEN X 
operator with an Xmem indirect addressing mode, is 
then READ from indirectly addressed X memory 
(Xmem). 
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[0046] The next stage of the pipeline is the READ (P5) 
stage 21 2 in which a memory operand, the address of 
which has been generated in a DAG EN Y operator with 
an Ymem indirect addressing mode or in a DAGEN C 
operator with coefficient address mode, is READ. The s 
address of the memory location to which the result of 
the instruction is to be written is output. 
[0047] In the case of dual access, read operands can 
also be generated in the Y path, and write operands in 
the X path. 10 
[0048] Finally, there is an execution EXEC (P6) stage 
214 in which the instruction is executed in either the A 
Unit 110 or the D Unit 112 The result is then stored in 
a data register or accumulator, or written to memory for 
Read/Modify/Write or store instructions. Additionally, is 
shift operations are performed on data in accumulators 
during the EXEC stage. 

[0049] The basic principle of operation for a pipeline 
processor will now be described with reference to Figure 
6. As can be seen from Figure 6, for a first instruction 20 
302, the successive pipeline stages take place over time 
periods T r T 7 . Each time period is a clock cycle for the 
processor machine clock. A second instruction 304. can 
enter the pipeline in period T 2 , since the previous in- 
struction has now moved on to the next pipeline stage. 2s 
For instruction 3, 306, the PRE -FETCH stage 202 oc- 
curs in time period T 3 . As can be seen from Figure 6 for 
a seven stage pipeline a total of 7 instructions may be 
processed simultaneously. For all 7 instructions 
302-31 4, Figure 6 shows them all under process in time 30 
period T 7 . Such a structure adds a form of parallelism 
to the processing of instructions. 
[0050] As shown in Figure 7, the present embodiment 
of the invention includes a memory management unit 
1 04 which is coupled to external memory units via a 24 35 
bit address bus 11 4 and a bi-directional 1 6 bit data bus 
116. Additionally, the memory management unit 104 is 
coupled to program storage memory (not shown) via a 
24 bit address bus 118 and a 32 bit bi-directional data 
bus 1 20. The memory management unit 1 04 is also cou- 40 
pled to the I Unit 106 of the machine processor core 102 
via a 32 bit program read bus (PB) 122. The P Unit 108, 
A Unit 110 and D Unit 112 are coupled to the memory 
management unit 104 via data read and data write 
busses and corresponding address busses. The P Unit 4S 
108 is further coupled to a program address bus 128. 
[0051] More particularly, the P Unit 108 is coupled to 
the memory management unit 104 by a 24 bit program 
address bus 128, the two 16 bit data write busses (EB, 
FB) 1 30, 1 32, and the two 1 6 bit data read busses (CB, so 
DB) 1 34, 1 36. The A Unit 110 is coupled to the memory 
management unit 104 via two 24 bit data write address 
busses (EAB, FAB) 160, 162, the two 16 bit data write 
busses (EB, FB) 1 30, 132, the three data read address 
busses (BAB, CAB, DAB) 1 64, 1 66, 1 68 and the two 1 6 ss 
bit data read busses (CB, DB) 134, 136. The D Unit 112 
is coupled to the memory management unit 104 via the 
two data write busses (EB, FB) 1 30, 1 32 and three data 



read busses (BB, CB, DB) 144. 134, 136. " 
[0052] * - Figure 7 represents the passing of instructions 
front the I Unit 1 06 to the P Unit 1 08 at 1 24. for forward- 
ing branch instructions for example. Additionally. Figure 
7 represents the passing of data from the I Unit 106 to 
the A Unit 110 and the D Unit 112 at 126 and 128 re- 
spectively. 

[0053] By way of background example, a known 
processing engine will now be described. The process- 
ing engine comprises a program memory 802 for storing 
instructions for execution by the processing engine. The 
instructions are stored at addressable locations 804 in 
the program memory 802. The program memory 802 is 
addressed over a Program Address Bus (PAB) 806. In- 
structions at locations 804 addressed over the PAB 806 
are placed on the Program Bus (PB) 808. The address- 
es for the program memory 802 are asserted on the PAB 
806 from a program counter 81 0. The program counter 
810 is initiated with the address of the first instruction 
for a program stored in the program memory 802, and 
then generally increases sequentially to address the in- 
structions in the program memory forwarding to func- 
tional units of the processing engine for execution. 
[0054] The program counter 810 also 'points 0 to an 
instruction in program memory 802 placed on the PB 
808 as shown by dotted arrow 81 2. Thus, by inspection 
of the program counter 810 it is possible to determine 
from which addressable location 804 a next instruction 
for the functional units of the processing engine will orig- 
inate. 

[0055] Instructions from the PB 808 are loaded into 
an Instruction Register 814 pending decode thereof by 
a Decoder 816. Decoded instructions are forwarded 
from the Decoder 816 to Execution Units 818. 
[0056] During execution of program instructions by 
the processing engine, it is possible that an instruction 
to repeat a following single instruction will be received 
and executed by the processing engine. Such an in- 
struction will "freeze" the program counter 81 0 at the ad- 
dress of the single instruction, so that the program coun- 
ter points to the location of that instruction for the dura- 
tion of the repeat. Thus that single instruction will be 
placed on PB 808 for each processor cycle, and repeat- 
edly decoded and executed. Such repeating is known 
as single instruction looping. 

[0057] An example of known processing engine con- 
figured to perform single instruction repeat loops will 
now be described with reference to Fig 8. 
[0058] In known processing engine, the instruction 
register is relatively small, for example 16 bits long in 
the Texas Instruments TMS 320C54X DSP integrated 
circuit chip. Consequently, long instruction words com- 
prising more than 16 bits, typically requiring more than 
one processor cycle to load them into the instruction reg- 
ister 81 4, decode them in the decoder 81 6, and execute 
them 818, are split in order to fit iriTb the instruction reg- 
ister 814. Accordingly, respective parts of a long multi- 
cycle instruction word are decoded and executed sep- 
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arately. During sequential program flow decoding and 
executing of long instruction words via the 16 bit register 
occurs without undue problems. This is even the case 
when a single instruction repeat loop comprises a long 
multi-instructional word. 

[0059] It is desirable during operation of processing 
engine to interrupt a repeat loop, for example when the 
processing engine receives an interrupt request from 
peripheral apparatus or in respect of a bus error. Since 
a long instruction word is split into parts for decoding 
and execution due to the limited size of the instruction 
register, it is necessary to save a great deal of context 
information in order for the program flow, i.e. single in- 
struction repeat, to return to the correct part of the long 
multi-cycle instruction on return from the interrupt sub- 
routine when the interrupt has been serviced. Conse- 
quently, for known processing engine interruption of sin- 
gle word repeat loops is not supported, even for sixteen 
bit instruction words which would wholly fit within the in- 
struction buffer. This is in order to reduce complexity and 
programming rules for the processing engine. 
[0080] Not supporting the interruption of single in- 
struction repeat loops has further drawbacks. Since fail- 
ure to service interrupt requests within a defined latency 
period may result in a catastrophic failure of the process- 
ing engine, or any system in which the processing en- 
gine resides, it is necessary to ensure that any single 
instruction repeat loop is broken after a certain number 
of instructions, for example 20, 50 or 100 repeats, in or- 
der to allow for the servicing of any interrupt request that 
may have occurred during the repeat loop. The repeat 
loop can then be continued once the pause to service 
any interrupt requests, and any interrupt requests have 
been serviced, is over This puts a further processing 
overhead on the processing engine, or upon a program- 
mer for the processing engine to ensure that such 
breaks are included in single instruction repeat loops. 
[0061] A preferred embodiment of the invention will 
now be described with reference to Fig. 9. 
[0062] An instruction buffer queue (IBQ) 502 is shown 
having instructions input thereto via program bus 122. 
Instruction code from the program bus 122 are written 
into the IBQ 502 at locations pointed to by LWPC 532. 
Instructions at locations pointed to by LRPC 536 are 
loaded from IBQ 502 into multiplexors 520,521 de- 
scribed with reference to Fig. 4. A long instruction such 
as a 48 bit instruction can be extracted from the IBQ 502 
and may comprise whole or complete multi-cycle in- 
struction words such as parallel operable instruction 
words, memory read/write instructions where data is 
read on a first cycle and a computation and data write 
takes place on a second cycle, and single memory in- 
structions with a large offset for example. The IBQ 502 
therefore acts as a kind of instruction decode register 
for that long instruction for example. Instruction words 
from the multiplexors 520/521 are decoded in decoders 
51 2 and/or 51 4 and the decoded instructions forwarded 
to appropriate execution units 902, such as the P Unit 



108, A Unit 110 and D Unit 112 described above with 
reference to Figs 2, 3 and 7. 

[0063] In response to the execution of an instruction 
for repeating a single instruction, a so-called °single-re- 

s peat" instruction, a flag in a Repeat Flag Register (RP- 
TF) 904 is set Additionally, the Repeat Count Register 
(RPTC) 906 is initialised and loaded with the initial re- 
peat count value i.e. the number of repeats to be per- 
formed. The single-repeat instruction includes the initial 

10 repeat count value, and the RPTF 904 and RPTC 906 
are located in a repeat loop controller 908. The repeat 
loop controller 908 also includes compare logic 910, 
which compares the value in RPTC 906 with zero, and 
decrement logic 912 for decrementing the value held in 

is RPTC 906 for the value not being equal to zero. 

[0064] Whilst the repeat flag is set, LRPC 536 and 
RPC 534 are not updated. Thus, the instruction at the 
location in the IBQ 502 pointed to by LRPC 536, and 
having a program memory location pointed to by RPC 

20 534, is repeatedly forwarded to the multiplexors 520/521 
for decode and subsequently through the rest of the 
processor pipeline stages for each repeat. During the 
decode stage, P2 referring to Figure 5, for the repeated 
instruction the value stored in RPTC 906 is decrement- 
's ed by 1. As described above, once an instruction has 
entered the decode stage it has to continue through the 
rest of the pipeline stages and be fully executed. The 
decremented value is compared in compare logic 910 
andthe repeat flag is updated for the decremented value 

30 not equal to zero. 

[0065] Whilst the repeat flag is set and LRPC and 
RPC are frozen, instructions can be accumulated within 
the IBQ 502, for subsequent execution once the repeat 
loop has been completed. This aspect of the preferred 

35 embodiment advantageously reduces delay in execut- 
ing instructions following the repeated single instruc- 
tions since they have already been fetched ("pre-fetch") 
into the IQB 502 from program memory, and allows 
greater parallelism of instructions after the repeat. 

40 [0066] The preferred embodiment also includes inter- 
rupt management logic 914. Interrupts, IT, received by 
the processing engine are routed to the interrupt man- 
agement logic 914. The interrupt management logic 914 
includes an arbitration unit 916 which determines the 

45 priority of interrupts when more than one interrupt is re- 
ceived. For each interrupt an appropriate flag in the in- 
terrupt flag register 918 is set, which although logically 
is part of the interrupt management logic is physically 
located in the P Unit 108 in the preferred embodiment. 

so The interrupt management logic 914 also comprises a 
single bit Jam Register (JMR) 920. 
[0067] On the processor cycle following receipt of an 
interrupt or interrupts the appropriate interrupt flag (IFR) 
918 is set, and if the interrupt global mask is enabled i. 

55 e. servicing of interrupts is enabled, the JAM bit in the 
JMR 920 is set. Setting the JAM bit causes the dispatch 
of instructions to the decoder 512, 514 to be disabled 
The interrupt control logic 914 takes control and stops 



13 



EP 0 992 689 A1 



14 



the repeat loop, shown 940 in Fig. 9. Loop information, 
shown collectively as RPT 922 and including the repeat 
flag RPTF 904 and other context information, is saved 
to the Context Flow Control Register (CFCT) 924 locat- 
ed in the P Unit 108. RPTC 906 is a memory mapped 
register and hence the current value may be stored in 
memory. 

[0068] The RPC 534 is saved to the Local Copy Read 
Program Counter (LCRPC) 926. Both CFCT 924 and 
LCRPC 926 serve to pipeline the context information 
and RPC to the stack register 928. In a preferred em- 
bodiment of the invention, the size of the context infor- 
mation and program counter address are limited to a to- 
tal of 32 bits in order to provide for a single cycle save, 
since the databus is 32 bus wide. 
[0069] Once the repeat flag has been saved it is 
cleared by the interrupt control logic 914. Thus. RPC 
534 and LRPC 536 are free. Once the JAM bit is set in 
J MR 920, the interrupt management logic 914 sets the 
WPC 530 to the interrupt vector The interrupt vector is 
exported directly to the program address bus PAB 128 
and addresses an area (interrupt vectot table) of pro- 
gram memory, for example starting at FFFF00, which 
contains interrupt sub-routine addresses as shown in 
Fig 10. The interrupt sub-routine addresses @ ISA0...3 
respectively point to further areas of program memory 
containing the interrupt sub-routines ISR0. .3. TTius, the 
interrupt vector address, resulting from the interrupt pri- 
ority arbitration, causes the start address of a corre- 
sponding interrupt sub-routine to be asserted on the 
PAB 128, and in WPC. The sub-routine instructions are 
then loaded into the IBQ 502 and enter the processor 
pipeline in the normal way for servicing the interrupt. 
[0070] On completion of the interrupt sub-routine, the 
repeat flag is restored from the stack 928 via CFCT 924 
to RPTF 904. This enables the repeat count circuitry. 
Due to memory access taking more than one processor 
cycle, the current repeat counter value is restored from 
memory to RPTC 906 prior to the return from interrupt 
On the return from interrupt the LRPC 536 is frozen and 
the WPC 530 and RPC 534 are restored from LCRPC 
926. The contents of RPC and the single instruction to 
be repeated is fetched from program memory to the IBQ 
502 and to the multiplexors 520/521 . Repeat of the sin- 
gle instruction, and fetch advance of subsequent in- 
structions, then continues as before the interrupt. 
[0071] The flow of instructions through a processing 
pipeline in accordance with the preferred embodiment 
of the invention will now be described with reference to 
Fig. 11 (a). The pipeline stages shown in Fig. 11 (a) will be 
referred to using the same nomenclature as used in Fig. 
5, namely P0 is the pre-fetch stage, P1 is the fetch 
stage, P2 is the decode stage, P3 is the address stage, 
P4 is the access stage, P5 is the read stage and P6 is 
the execute stage. 

[0072] A single repeated instruction I0 O is shown 1 1 02 
in the decode stage P2 of the processor pipeline and 
proceeds down the pipeline to the execute stage P6 



1104. The single repeated instruction undergoes the 
next repeat, I0 V starting in the decode stage 1106 and 
proceeding to the execute stage 1108. At some point 
after an interrupt occurs the dispatch from the decoder 
5 is disabled, shown I0/IT 1 1 1 0 in the pipeline grid of Fig. 
11. 

[0073] In the next processor cycle following disabling 
of the dispatch the interrupt vector address, @ ITV, is 
placed directly on the PAB 128 in the pre-fetch stage 

10 1112 of the processor pipeline. In the fetch stage, 1114, 
the contents of the address @ITV are read, which are 
the interrupt sub-routine address ©ISA, and asserted 
on PAB 128 'm the next processor cycle 1116. The first 
instruction, ISRXo, for the sub-routine located at 

is @ISAX is transferred directly to the decode stage 1118 
of the processor pipeline. Subsequently, the next in- 
struction ISRXj for the interrupt sub-routine enters the 
pipeline, and execution of the sub-routine continues in 
a conventional manner. On the return from interrupt, the 

20 repeat of the single instruction is restarted. 

[0074] Fig 1 1 (b) shows the respective signals and op- 
erations relative to the processor cycles and the instruc- 
tion pipeline shown in Fig 11 (a). 
[0075] The single instruction repeat is active as 

2S shown by 1 1 20. In processor cycle 1 1 22 an interrupt. IT 
1124, which is active low, is received by the processing 
engine. On the following cycle, 11 26, the appropriate in- 
terrupt flag is set in the I FR 91 8, and on the subsequent 
cycle, 1128, the JAM bit 1130 is set and the dispatch 

30 disabled. The RPTC 906 is decremented for each de- 
code of 10 in cycles 1122 and 1126, but is inhibited and 
frozen once the dispatch is disabled in cycle 11 28. How- 
ever, if the ISR is utilising a repeat loop then the contents 
of RPTC must be stored under control of user coded 

35 software for the ISR. 

[0076] The flow of instructions through a processing 
pipeline for aretum from interrupt in accordance with the 
preferred embodiment of the invention will now be de- 
scribed with reference to Fig. 12. The pipeline stages 

40 shown in Fig. 12(a) will be referred to using the same 
nomenclature as used in Fig.5 and with reference to Fig. 
11 above. 

[0077] In processor cycle 1 202 a return from interrupt 
instruction (RET) is decoded in the decode stage P2 of 

45 the pipeline, and RPC 534 contains the program mem- 
ory address of the RET instruction. The RET instruction 
causes WPC 530 to be loaded with the program memory 
address of the interrupted single repeated instruction 
@I0 from LCRPC 926 such that for the next processor 

so cycle 1 204 the pre-fetch stage P0 of the pipeline access- 
es program memory address @I0. For the following 
processor cycle 1206, the instruction 10 is placed on the 
PB 122 in the Fetch stage P1 and on the subsequent 
cycle 1208 the instruction is decoded in the decode 

ss stage P2 of the pipeline. During the decode stage P2, 
RPC 534 is updated with the program memory address 
@I0, and the repeat flag is set in the RPTF register 904 
having been restored from the CFCT register 924. As 
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referred to above, the current count value is restored to 
the RPTC register 906 prior to the return from interrupt, 
and during the access stage P3 of the pipeline the RPTC 
906 value is decremented. For the decode stage 92 of 
the first pass of the repeated single instruction after the s 
return from interrupt. The RPC 534 value is now frozen 
for the remainder of the repeat, and the repeat single 
instruction continues as before. 
[0078] An advantage of the preferred embodiment of 
the invention is that single instruction repeat loops can 10 
be interrupted without the need for storing large 
amounts of context information in order to coherently re- 
turn from an interrupt sub-routine. Thus, there is no need 
to limit repeat loop sizes in order to provide opportunity 
for the processing engine to service interrupts, and in- is 
terrupt latency is improved. 

[0079] Embodiments of the invention are suitable for 
wireless communication devices such as a radio tele- 
phone 50 illustrated in Fig. 12. Such a radiotelephone 
50 comprises a user interface 52 including a display 54 20 
such as a liquid crystal display, and a key pad 56. The 
radio telephone also includes a transceiver, not shown, 
and an antenna 56. 

[0080] In view of the foregoing description it will be 
evident to a person skilled in the art that various modi- 2s 
fications may be made within the scope of the invention. 
For example, a bit may be set in the decode mechanism 
having an active state indicating that the decode mech- 
anism is disabled. 

[0081] The scope of the present disclosure includes 30 
any novel feature or combination of features disclosed 
therein either explicitly or implicitly or any generalisation 
thereof irrespective of whether or not it relates to the 
claimed invention or mitigates any or all of the problems 
addressed by the present invention. The applicant here- 35 
by gives notice that new claims may be formulated to 
such features during the prosecution of this application 
or of any such further application derived therefrom. In 
particular, with reference to the appended claims, fea- 
tures from dependent claims may be combined with 40 
those of the independent claims and features of respec- 
tive independent claims may be combined in any appro- 
priate manner and not merely in the specific combina- 
tions enumerated in the claims. 



responsive to said signal to disable said decode 
mechanism. 

3. A processing engine according to claim 2, the de- 
code mechanism configurable to complete a current 
decode operation prior to disablement. 

4. A processing engine according to claim 2 or claim 
3, and further comprising an instruction pipeline in- 
cluding a plurality of pipeline stages, said process- 
ing engine configurable to disable said decode 
mechanism during a decode stage of said instruc- 
tion pipeline. 

5. A processing engine according to any of claims 
2 to 4, responsive to a decode flag in said decode 
mechanism to disable said decode mechanism. 

6. A processing engine according to any of claims 
2 to 4, responsive to a signal to inhibit dispatch of 
an instruction to said decode mechanism. 

7. A processing engine according to any of claims 
1 to 6, further comprising a program counter for 
pointing to a program memory location correspond- 
ing to said subsequent instruction, and a memory 
for storing a value for said program counter during 
said interrupt. 

8. A processing engine according to claim 7, con- 
figurable to freeze said program counter for said re- 
peat execution of said subsequent instruction. 

9. A processing engine according to any of claims 
1 to 8, configurable for storing context information 
associated with said second instruction in said 
memory. 

1 0. A processing engine, according to any of claims 
7 to 9, wherein said memory comprises a register. 

1 1 . A processing engine according to any preceding 
claim, further comprising an interrupt control mech- 
anism responsive to an interrupt request signal to 
initiate said signal to interrupt said repeat execution. 



Claims 

1. A processing engine, comprising an instruction 
decode register adapted to buffer a complete in- 
struction pending decode thereof, and a decode 
mechanism configured to decode instructions, said 
processing engine responsive to a repeat instruc- 
tion to repeat execution of a subsequent instruction 
and further responsive to a signal to interrupt said 
repeat execution ol said subsequent instruction. 

2. A processing engine according to claim 1 , further 



12. A processing engine according to claim 11, 
wherein said interrupt control mechanism is respon- 
sive to said interrupt request to set an interrupt flag. 

13. A processing engine according to claim 11 or 
claim 12, said interrupt control mechanism respon- 
sive to said interrupt request signal to initiate an in- 
terrupt service routine for determining the nature of 
said interrupt and calling a corresponding program 
sub-routine. ~ 

14. A processing engine according to any preceding 
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claim, wherein said instruction decode register is 
adapted to buffer instructions having a bit length 
from a range, 8, 16, 24, 32. 40 and 48 bits. 

15. A processing engine according to claim 14. $ 
wherein said instruction decode register is adapted 

to buffer multi-cycle instructions and/or an instruc- 
tion pair. 

16. A processing engine according to claim 7 or 8 10 
or any preceding claim dependent on claim 7 or 8, 
responsive to an end of said interrupt to restore said 
program counter value to said program counter 
from said memory. 

15 

17. Processing engine according to claim 16, re- 
sponsive to transfer said second instruction to said 
instruction decode register in accordance with said 
program counter value. 

20 

18. A processing engine according to claim 16 or 
1 7, further comprising a further program counter for 
pointing to a location in program memory corre- 
sponding to an instruction for transfer to said in- 
struction decode register, the processing engine re- 2$ 
sponsive to said end of said interrupt to load said 
further program counter with a value corresponding 

to said program counter value for transferring said 
second instruction from program memory to said in- 
struction decode register. 30 

19. Aprocessing engine according toany preceding 
claim and configured for context data and program 
counter data bit total to not greater than the bit size 

of a stack registerfor the processing engine. 35 

20. Aprocessor comprising a processing engine ac- 
cording to any of claims 1 to 19. 

21. An integrated circuit comprising a processing *o 
engine according to any of claims 1 to 1 9. 

22. A digital signal processor comprising a process- 
ing engine according to any of claims 1 to 19. 

45 

23. An integrated circuit comprising a digital signal 
processor according to claim 22. 

24. A method for operating a processing engine in- 
cluding an instruction decode register, the method so 
comprising 



25. A method according to claim 24, step (i) further 
comprising the steps of 

(v) decoding said instruction, and 

(vi) dispatching said decoded instruction for ex- 
ecuting. 

26. A method according to claim 25, further com- 
prising disabling said decoding for interrupting said 
repeated execution. 

27. A method according to claim 26, further com- 
prising completing a current decoding operation pri- 
or to disabling said decoding. 

28. A method according to any one of claims 24 to 

27, further comprising storing a program memory 
address for said instruction during said interrupting 
said repeated execution. 

29. A method according to any one of claims 24 to 

28, further comprising storing data associated with 
said repeated execution of said instruction during 
said interrupting of said repeated execution. 

30. A method according to any one of claims 24 to 

29, step (ii) further comprising noting and identifying 
an interrupt type. 

31. A method according to claim 30, wherein said 
another instruction comprises a program sub-rou- 
tine corresponding to said type of interrupt. 

32. A method according to any of claims 24 to 31 , 
further comprising restoring said instruction to said 
instruction decode register for an end of said inter- 
rupting said repeated execution. 

33. A telecommunications device comprising a dig- 
ital signal processor according to claim 22. 

34. A telecommunications device operable in ac- 
cordance with a method according to any of claims 
24 to 32. 

34. A wireless communication device comprising a 
telecommunication device according to claim 33 or 
claim 34, a user interface including a display, a key- 
pad or keyboard for inputting data to the communi- 
cations device, a transceiver and an antenna. 



(i) repeatedly executing an instruction wholly 
disposed in said instruction decode register, 

(ii) interrupting said repeated execution, 

(Hi) loading said instruction decode register with 

another instruction, and 

(iv) executing said another instruction. 
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